ViSL model: The model automatically generates sentences of Vietnamese sign language
Abstract
A major obstacle to building intelligent systems is the shortage of training data for machine learning, a problem that is especially acute for sign language recognition for deaf and hard-of-hearing people. One way to increase the amount of training data is synthesis. Unlike speech synthesis, however, for Vietnamese and some other languages it is impossible to create a sequence of gestures that exactly reproduces the text. This is due to the substantial limitations of the gesture dictionary and the different word order of sentences in sign language. The aim of this work is to enrich the training corpus of video data used to build recognition systems for Vietnamese Sign Language (ViSL). Since the words of the source text cannot be translated into gestures one to one, the text must be translated from a spoken language into a sign language. The paper proposes a two-phase process for this. The first phase pre-processes the text: it normalizes the text format, segments words and sentences, and then encodes the words using the sign language dictionary. Punctuation marks and stop words are deliberately not removed at this stage, because they affect the accuracy of the N-gram model. In the second phase, instead of syntactic analysis, a statistical method is used to form the sequence of gestures. It is based on a Markov model over the graph of transitions between words, in which the probability of the next word depends only on the two preceding words. The transition probabilities are calculated from the existing annotated ViSL corpus. Breadth-first search is used to enumerate all sentences that can be generated from a given grammar rule and a matrix of semantic relations between words. The frequency with which a sentence occurs in a given dataset is estimated by the inverse of the logarithm of the product of the co-occurrence probabilities of its consecutive three-word phrases.
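The second phase can be illustrated with a minimal Python sketch, assuming the text has already been normalized, segmented, and mapped to dictionary words (the first phase is not reproduced here). All names (`train_trigrams`, `trigram_prob`, `sentence_score`, `generate_bfs`) are hypothetical; the semantic interaction matrix is modelled, as an assumption, by a set of word pairs that may follow each other; and probabilities are unsmoothed maximum-likelihood trigram estimates. The abstract does not fix a sign convention for the "inverse of the logarithm", so the sketch assumes a score of 1/|log P|, which grows as the sentence probability P grows. The toy corpus is illustrative only and is not the paper's data.

```python
import math
from collections import Counter, deque

def train_trigrams(corpus):
    """Count trigrams and their two-word histories in tokenized sentences.
    Punctuation and stop words are kept because they affect N-gram estimates."""
    trigrams, histories = Counter(), Counter()
    for tokens in corpus:
        for i in range(len(tokens) - 2):
            trigrams[tuple(tokens[i:i + 3])] += 1
            histories[tuple(tokens[i:i + 2])] += 1
    return trigrams, histories

def trigram_prob(trigrams, histories, w1, w2, w3):
    """Unsmoothed maximum-likelihood estimate of P(w3 | w1, w2):
    the next word depends only on the two preceding words."""
    seen = histories[(w1, w2)]
    return trigrams[(w1, w2, w3)] / seen if seen else 0.0

def sentence_score(words, trigrams, histories):
    """Score a sentence from the product P of its consecutive trigram probabilities.
    ASSUMPTION: 'inverse of the logarithm' is read as 1 / |log P| = -1 / log P,
    so a more probable sentence gets a higher score."""
    if len(words) < 3:
        return 0.0  # no three-word phrase to score
    log_p = 0.0
    for i in range(len(words) - 2):
        p = trigram_prob(trigrams, histories, *words[i:i + 3])
        if p == 0.0:
            return 0.0  # an unseen trigram makes P = 0
        log_p += math.log(p)  # summing logs avoids underflow of the product
    return math.inf if log_p == 0.0 else -1.0 / log_p

def generate_bfs(rule, lexicon, related):
    """Breadth-first enumeration of all sentences matching a POS rule.
    rule:    list of part-of-speech tags, e.g. ["N", "V", "N"]
    lexicon: dict tag -> dictionary words carrying that tag
    related: set of (word, next_word) pairs; ASSUMED stand-in for the
             semantic interaction matrix (pair present = words may follow)"""
    if not rule:
        return []
    results = []
    queue = deque([w] for w in lexicon.get(rule[0], []))
    while queue:
        partial = queue.popleft()
        if len(partial) == len(rule):
            results.append(partial)
            continue
        for w in lexicon.get(rule[len(partial)], []):
            if (partial[-1], w) in related:
                queue.append(partial + [w])
    return results

# Toy data (hypothetical, for illustration only).
corpus = [
    ["tôi", "ăn", "cơm", "."],
    ["tôi", "ăn", "phở", "."],
    ["tôi", "ăn", "cơm", "."],
    ["mẹ", "nấu", "cơm", "."],
    ["mẹ", "nấu", "phở", "."],
]
lexicon = {"N": ["tôi", "mẹ", "cơm", "phở"], "V": ["ăn", "nấu"]}
related = {("tôi", "ăn"), ("mẹ", "ăn"), ("mẹ", "nấu"),
           ("ăn", "cơm"), ("ăn", "phở"), ("nấu", "cơm")}

trigrams, histories = train_trigrams(corpus)
candidates = generate_bfs(["N", "V", "N"], lexicon, related)
ranked = sorted(candidates,
                key=lambda s: sentence_score(s, trigrams, histories),
                reverse=True)
for sentence in ranked:
    print(" ".join(sentence), round(sentence_score(sentence, trigrams, histories), 3))
```

Because BFS expands partial sentences one slot at a time, the semantic pairs prune dead prefixes (here, anything starting with "cơm" or "phở") before they grow, and the trigram score only has to rank complete candidates. On this toy corpus "tôi ăn cơm" ranks first, while "mẹ ăn cơm" scores 0.0 because its two-word history never occurs in the corpus.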
Using ViSL data comprising 3,234 words, we calculated probability matrices representing the relationships between words from a Vietnamese natural-language corpus of 50 million sentences collected from Vietnamese newspapers and magazines. For different grammar rules, we compare the number of generated sentences and evaluate the accuracy of the 50 most frequent sentences; the average accuracy is 88%. The accuracy of the generated sentences was estimated by manual statistical evaluation. The number of generated sentences depends on how many words are labeled with the parts of speech that the grammar rule requires. The semantic accuracy of the generated sentences is very high when the searched words carry correct part-of-speech tags. Compared with machine learning methods, the proposed method gives very good results for languages that have no inflection and whose word order follows fixed rules, such as Vietnamese, and it does not require large computational resources. Its disadvantage is that accuracy depends heavily on the word type, the sentence type, and the quality of word segmentation. The relationships between words also depend on the observed dataset. A direction for future research is generating paragraphs in sign language. The resulting data can be used to train machine learning models for sign language processing tasks.
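The evaluation bookkeeping described above can be sketched in Python under our own assumptions: the number of candidates a grammar rule can produce is bounded by the product of the number of dictionary words carrying each part-of-speech tag in the rule, the top-k sentences are selected by score (k = 50 in the paper), manual judgments are supplied as booleans, and the reported average is assumed to be a mean over grammar rules. The helper names and toy values are hypothetical and do not reproduce the paper's results.

```python
import heapq
import math

def candidate_upper_bound(rule, lexicon):
    """Upper bound on the sentences a part-of-speech rule can yield: the product
    of the number of dictionary words tagged with each slot's POS. Semantic
    filtering can only prune this set, never enlarge it."""
    return math.prod(len(lexicon.get(tag, [])) for tag in rule)

def top_k(scored, k=50):
    """Return the k highest-scoring (sentence, score) pairs, best first."""
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

def accuracy(judgments):
    """Fraction of sentences a human annotator marked as correct (True)."""
    return sum(judgments) / len(judgments) if judgments else 0.0

def mean_accuracy(per_rule_judgments):
    """ASSUMPTION: the average accuracy is the mean of per-rule accuracies."""
    if not per_rule_judgments:
        return 0.0
    return sum(accuracy(j) for j in per_rule_judgments) / len(per_rule_judgments)

# Toy inputs (hypothetical, not the paper's data).
lexicon = {"N": ["tôi", "mẹ", "cơm", "phở"], "V": ["ăn", "nấu"]}
scored = [(("tôi", "ăn", "phở"), 0.91),
          (("tôi", "ăn", "cơm"), 2.47),
          (("mẹ", "nấu", "cơm"), 1.44)]

print(candidate_upper_bound(["N", "V", "N"], lexicon))  # 32
print([s for s, _ in top_k(scored, k=2)])
print(mean_accuracy([[True, True, False, True], [True, False]]))  # 0.625
```

The bound shows why the number of generated sentences grows with the number of correctly tagged words: adding one more noun to this toy lexicon raises the bound for ["N", "V", "N"] from 4 × 2 × 4 = 32 to 5 × 2 × 5 = 50.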